Multi-dimensional data arrays¶
Digital representation of data¶
digital_signal = [0, 1, 4, 4, 4, 2, 2, 1, 2, 4, 5]
sample_interval = 1
Most of the data you will work with is stored as arrays of values with one or more dimensions.
Let's consider some examples.
1-D: Time series at regular sample intervals¶
1-D: DNA sequence¶
2-D: Grayscale image¶
3-D: Color image¶
2-D: EEG time series for multiple locations¶
3-D: EEG time series for multiple locations, trials¶
N-D: EEG time series for multiple locations, trials, subjects, conditions, ...¶
Hard to visualize, but easy to use mathematically and in code.
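As a rough sketch of the examples listed above, the following builds placeholder arrays with one to four dimensions using NumPy (introduced in the next section) and prints each array's `ndim` and `shape`. The sizes and variable names are made up for illustration, and the values are just zeros.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
time_series = np.zeros(1000)            # 1-D: samples
gray_image = np.zeros((480, 640))       # 2-D: rows x columns
color_image = np.zeros((480, 640, 3))   # 3-D: rows x columns x RGB
eeg = np.zeros((10, 500, 3, 15))        # 4-D: locations x time x trials x subjects

examples = [
    ("time_series", time_series),
    ("gray_image", gray_image),
    ("color_image", color_image),
    ("eeg", eeg),
]
for name, arr in examples:
    print(name, arr.ndim, arr.shape)
```

Adding a dimension only adds one more entry to the shape; the code for creating and inspecting the array stays the same.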
NumPy¶
Without NumPy we would NOT USE PYTHON for scientific computing or data analysis.¶
Here's a quick comparison of NumPy with Python lists to give you an idea of why it is so essential.
| Python lists | NumPy arrays |
|---|---|
| Very inefficient and slow for large arrays. | Highly optimized C code behind the scenes. |
| Nested lists are a nightmare for multi-dimensional data arrays. | N-dimensional array syntax is simple and easy. |
| Great for small arrays of arbitrary types of objects. | Only allows arrays of the same type of object. |
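A small sketch of the second and third rows of the table: pulling a column out of a nested list needs an explicit loop, while an array does it with one index expression. The variable names are illustrative only.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]
array = np.array(nested)

# Second column of a nested list: loop over the rows.
col_from_list = [row[1] for row in nested]

# Second column of an array: a single index expression.
col_from_array = array[:, 1]

print(col_from_list)   # [2, 5]
print(col_from_array)  # [2 5]

# Multiplying a list repeats it; multiplying an array works elementwise.
print([1, 2, 3] * 2)            # [1, 2, 3, 1, 2, 3]
print(np.array([1, 2, 3]) * 2)  # [2 4 6]
```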
NumPy 1-D arrays¶
Learning goals
- You will be able to initialize arrays.
- You will be able to index/slice into arrays.
- You will be able to index/slice with logical masks.
- You will be able to compute array statistics.
- You will be able to do array math.
NumPy 1-D array initialization¶
NumPy arrays generally cannot change size after creation, so create each array at the size you need.
import numpy
numpy.array([1, 2, 3])
array([1, 2, 3])
import numpy as np
np.array([1, 2, 3])
array([1, 2, 3])
np.ones(3), np.zeros(3), np.random.random(3)
(array([1., 1., 1.]), array([0., 0., 0.]), array([0.326398 , 0.7878623 , 0.90420253]))
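Because an array's size is fixed, a common pattern is to preallocate it and then fill it in place. The sketch below assumes a hypothetical size `n_samples` known in advance, and also shows that `np.append` returns a new, larger array rather than growing the original.

```python
import numpy as np

n_samples = 5  # hypothetical size, known in advance

# Preallocate, then fill in place.
squares = np.zeros(n_samples)
for i in range(n_samples):
    squares[i] = i ** 2

# np.append does not grow the array in place; it returns a new array.
longer = np.append(squares, 25.0)

print(squares.shape, longer.shape)  # (5,) (6,)
```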
Index and slice just like a Python list¶
data = np.array([1, 2, 3])
data
array([1, 2, 3])
data[0], data[1], data[0:2], data[1:]
(1, 2, array([1, 2]), array([2, 3]))
A slice of a NumPy array assigned to a variable is a view: it still references the original array's data (it is not a copy)¶
myarray = np.array([1, 2, 3])
array_slice = myarray[1:]
# reference to data in myarray
array_slice
array([2, 3])
array_slice[0] = -8
array_slice
array([-8, 3])
myarray
array([ 1, -8, 3])
In contrast, assigning a slice of a Python list to a variable creates a copy¶
mylist = [1, 2, 3]
list_slice = mylist[1:]
# copy of data from mylist
list_slice
[2, 3]
list_slice[0] = -8
list_slice
[-8, 3]
mylist
[1, 2, 3]
This means you can pass a slice to a function and still mutate the original array!
myarray = np.array([1, 2, 3])
myslice = myarray[1:]
def change_array(arr):
    arr[0] = 100
change_array(myslice)
myarray
array([ 1, 100, 3])
What if you want a copy of a slice into a NumPy array?
myarray = np.array([1, 2, 3])
ref_slice = myarray[1:]
copy_slice = myarray[1:].copy()
print(' ref = ', ref_slice)
print('copy = ', copy_slice)
ref = [2 3]
copy = [2 3]
ref_slice[0] = -50
copy_slice[0] = 100
print(' ref = ', ref_slice)
print('copy = ', copy_slice)
ref = [-50 3]
copy = [100 3]
What happened to myarray?
myarray
array([ 1, -50, 3])
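If it is unclear whether two arrays refer to the same data, one way to check is `np.shares_memory`, or the `.base` attribute of the slice. This is a minimal sketch reusing the names from the example above.

```python
import numpy as np

myarray = np.array([1, 2, 3])
ref_slice = myarray[1:]           # view into myarray
copy_slice = myarray[1:].copy()   # independent copy

print(np.shares_memory(myarray, ref_slice))   # True
print(np.shares_memory(myarray, copy_slice))  # False

# A view records the array it was sliced from; a copy owns its data.
print(ref_slice.base is myarray)  # True
print(copy_slice.base is None)    # True
```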
NumPy also lets you slice with logical arrays!¶
data = np.array([1, 2, 3])
mask = data > 1
mask
array([False, True, True])
data[mask]
array([2, 3])
data[data != 2]
array([1, 3])
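Masks can also be combined and used on the left side of an assignment. The sketch below uses a slightly longer, made-up array; note that the elementwise operators are `&`, `|` and `~`, and each comparison needs its own parentheses.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6])

# Combine conditions elementwise: & (and), | (or), ~ (not).
middle = data[(data > 1) & (data < 5)]   # [2 3 4]
ends = data[(data < 2) | (data > 5)]     # [1 6]
odd = data[~(data % 2 == 0)]             # [1 3 5]

# A mask on the left side assigns only to the selected elements.
clipped = data.copy()
clipped[clipped > 4] = 4                 # [1 2 3 4 4 4]

print(middle, ends, odd, clipped)
```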
Array statistics are a breeze with NumPy!¶
data = np.array([1, 2, 3])
data.mean()
2.0
data.min(), data.max(), data.sum(), data.prod()
(1, 3, 6, 6)
# variance and standard deviation
data.var(), data.std()
(0.6666666666666666, 0.816496580927726)
np.mean(data)
2.0
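The variance and standard deviation shown above divide by N (the population formulas), which is NumPy's default. If the sample versions that divide by N - 1 are wanted, the `ddof` argument can be used, as in this short sketch.

```python
import numpy as np

data = np.array([1, 2, 3])

pop_var = data.var()            # divides by N (default, ddof=0)
sample_var = data.var(ddof=1)   # divides by N - 1
sample_std = data.std(ddof=1)

print(pop_var, sample_var, sample_std)
```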
Array math is a breeze with NumPy!¶
data = np.array([1, 2, 3])
ones = np.ones(3)
print(data + ones)
print(data - ones)
print(data * data)
print(data / data)
print(data**3)
[2. 3. 4.]
[0. 1. 2.]
[1 4 9]
[1. 1. 1.]
[ 1 8 27]
2 * (data + 1)
array([4, 6, 8])
Some useful array functions: arange, linspace, logspace¶
# start, stop, step
np.arange(2, 12, 2)
array([ 2, 4, 6, 8, 10])
# 5 values from 0 to 1
# evenly spaced on a linear scale
np.linspace(0, 1, 5)
array([0. , 0.25, 0.5 , 0.75, 1. ])
# 5 values from 10^-1 to 10^1
# evenly spaced on a log scale
np.logspace(-1, 1, 5)
array([ 0.1 , 0.31622777, 1. , 3.16227766, 10. ])
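One typical use of these functions is building a time axis for a regularly sampled signal. The sketch below reuses `digital_signal` and `sample_interval` from the start of this notebook; the `time` variable names are introduced here for illustration.

```python
import numpy as np

digital_signal = np.array([0, 1, 4, 4, 4, 2, 2, 1, 2, 4, 5])
sample_interval = 1

n_samples = len(digital_signal)

# One time point per sample, spaced by the sample interval.
time = np.arange(n_samples) * sample_interval

# The same axis via linspace: first time, last time, number of points.
time_linspace = np.linspace(0, (n_samples - 1) * sample_interval, n_samples)

print(time)
print(np.allclose(time, time_linspace))  # True
```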
NumPy 2-D arrays¶
Learning goals
- You will be able to initialize arrays.
- You will be able to index/slice into arrays.
- You will be able to index/slice with logical masks.
- You will be able to compute array statistics.
- You will be able to do array math.
- You will understand broadcasting.
NumPy 2-D array initialization¶
np.zeros([2, 3])
array([[0., 0., 0.],
[0., 0., 0.]])
np.ones([2,3])
array([[1., 1., 1.],
[1., 1., 1.]])
np.random.random([2,3])
array([[0.63464258, 0.80357956, 0.02969486],
[0.94341755, 0.77258912, 0.04859342]])
NumPy 2-D array indexing/slicing¶
# to understand this line
# see broadcasting below
data = np.arange(6).reshape([1,6]) \
+ np.arange(0, 60, 10).reshape([6,1])
data
array([[ 0, 1, 2, 3, 4, 5],
[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55]])
data[2,3]
23
data[0,3:5]
array([3, 4])
data[4:,4:]
array([[44, 45],
[54, 55]])
data[:,2]
array([ 2, 12, 22, 32, 42, 52])
data[2::2,::2]
array([[20, 22, 24],
[40, 42, 44]])
NumPy 2-D array statistics¶
data = np.array([[1,2],[3,4]])
data
array([[1, 2],
[3, 4]])
data.max()
4
data.max(axis=0)
array([3, 4])
data.max(axis=1)
array([2, 4])
NumPy 2-D array math (basically the same as 1-D)¶
data = np.array([[1,2],[3,4]])
ones = np.ones([2,2])
data, ones
(array([[1, 2],
[3, 4]]),
array([[1., 1.],
[1., 1.]]))
data + ones
array([[2., 3.],
[4., 5.]])
data**2
array([[ 1, 4],
[ 9, 16]])
NumPy 2-D logical indexing (basically the same as 1-D)¶
data = np.array([[1,2],[3,4]])
mask = data < 3
data, mask
(array([[1, 2],
[3, 4]]),
array([[ True, True],
[False, False]]))
data[mask]
array([1, 2])
Broadcasting¶
data = np.array([[1,2,3],[4,5,6]])
row = np.array([[1,2,3]])
col = np.array([[1],[2]])
data + row, data * col
(array([[2, 4, 6],
[5, 7, 9]]),
array([[ 1, 2, 3],
[ 8, 10, 12]]))
row = np.array([[1,2,3]])
col = np.array([[1],[2]])
row + col
array([[2, 3, 4],
[3, 4, 5]])
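To make the rule behind these results explicit: when two shapes are compared axis by axis, an axis of size 1 is stretched to match the other array. The sketch below uses `np.broadcast_to` to show the stretched operands, and a deliberately incompatible pair of shapes to show the failure case; the `broadcast_failed` flag is just an illustrative name.

```python
import numpy as np

row = np.array([[1, 2, 3]])   # shape (1, 3)
col = np.array([[1], [2]])    # shape (2, 1)

# Size-1 axes are stretched: (1, 3) with (2, 1) gives (2, 3).
result = row + col
print(result.shape)  # (2, 3)

# np.broadcast_to shows each operand as it is effectively used.
row_stretched = np.broadcast_to(row, (2, 3))
col_stretched = np.broadcast_to(col, (2, 3))
print(row_stretched)
print(col_stretched)

# Axes that differ and are not size 1 cannot be broadcast.
try:
    np.ones((2, 3)) + np.ones((2, 2))
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```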
Matrix multiplication¶
data = np.array([1,2,3])
tens = np.array([[1,10],[2,20],[3,30]])
data, tens
(array([1, 2, 3]),
array([[ 1, 10],
[ 2, 20],
[ 3, 30]]))
data @ tens
array([ 14, 140])
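As a check on what `@` computes here, each output entry is the sum of elementwise products of `data` with one column of `tens`. The sketch below recomputes the result by hand; `by_hand` is an illustrative name.

```python
import numpy as np

data = np.array([1, 2, 3])                    # shape (3,)
tens = np.array([[1, 10], [2, 20], [3, 30]])  # shape (3, 2)

# One sum of products per column of tens.
by_hand = np.array([
    (data * tens[:, 0]).sum(),   # 1*1 + 2*2 + 3*3
    (data * tens[:, 1]).sum(),   # 1*10 + 2*20 + 3*30
])

print(by_hand)                               # [ 14 140]
print(np.array_equal(by_hand, data @ tens))  # True
```

Note that `*` is elementwise multiplication, while `@` is matrix multiplication; the inner dimensions (3 and 3 here) must match.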
Transpose¶
data = np.array([[1,2,3],[4,5,6]])
data
array([[1, 2, 3],
[4, 5, 6]])
data.T
array([[1, 4],
[2, 5],
[3, 6]])
Reshape¶
data = np.arange(1,7)
data
array([1, 2, 3, 4, 5, 6])
data.reshape(2,3)
array([[1, 2, 3],
[4, 5, 6]])
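Two details of `reshape` that are easy to miss, shown as a short sketch: one dimension can be given as `-1` and NumPy infers it, and the total number of elements must stay the same. For a contiguous array like this one, the reshaped result is also a view of the original data.

```python
import numpy as np

data = np.arange(1, 7)   # 6 elements

# -1 asks NumPy to infer that dimension: 6 elements / 2 rows = 3 columns.
two_rows = data.reshape(2, -1)
print(two_rows.shape)  # (2, 3)

# The total number of elements must match (4 * 2 != 6).
try:
    data.reshape(4, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True

# The reshaped array here is a view: writing to it changes data.
two_rows[0, 0] = 100
print(data[0])  # 100
```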
NumPy N-D arrays¶
Learning goals
- You will be able to work with arrays of any number of dimensions.
- You will be able to index/slice into arrays of any number of dimensions.
- You will appreciate the usefulness of N-D arrays for real data.
- You will understand that each array can only contain a single type of data.
- You will appreciate that NumPy is fast.
NumPy 3-D arrays¶
Array shape¶
# data is represented in previous image
data = np.arange(1,25).reshape(4,3,2)
data.shape
(4, 3, 2)
rows = data.shape[0]
cols = data.shape[1]
depth = data.shape[2]
rows, cols, depth
(4, 3, 2)
rows, cols, depth = data.shape
rows, cols, depth
(4, 3, 2)
NumPy 3-D array indexing/slicing¶
data[0,2,1]
6
data[:,-1,:]
array([[ 5, 6],
[11, 12],
[17, 18],
[23, 24]])
data[:,-1]
array([[ 5, 6],
[11, 12],
[17, 18],
[23, 24]])
data[::2,2,1]
array([ 6, 18])
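To see why `data[0,2,1]` is 6, it can help to compute where that element sits in the flat sequence 1..24. The sketch below assumes NumPy's default row-major layout (the last index varies fastest), and also shows `...` as shorthand for "all remaining axes".

```python
import numpy as np

data = np.arange(1, 25).reshape(4, 3, 2)
rows, cols, depth = data.shape

# Position of element [i, j, k] in the flattened array (row-major order).
i, j, k = 0, 2, 1
flat_position = i * (cols * depth) + j * depth + k

print(flat_position)                               # 5
print(data[i, j, k], data.ravel()[flat_position])  # 6 6

# Ellipsis stands for all leading axes: same as data[:, :, 1].
print(data[..., 1].shape)  # (4, 3)
```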
3-D: EEG time series for multiple channels, trials¶
n_channels = 10
n_time_pts = 500
n_trials = 3
# fake EEGs
EEGs = np.random.random(
[n_channels, n_time_pts, n_trials]
)
EEGs.shape
(10, 500, 3)
# data = channel 1, trial 2
data = EEGs[1,:,2]
data.shape
(500,)
# data = channel 1, all trials
data = EEGs[1,:,:]
data.shape
(500, 3)
# channel 1 trial average
chan1_avg = data.mean(axis=1)
# chan1_avg = EEGs[1,:,:].mean(axis=1)
chan1_avg.shape
(500,)
# data = all channels, trial 1
data = EEGs[:,:,1]
data.shape
(10, 500)
# average EEG across channels for trial 1
trial1_avg = data.mean(axis=0)
# trial1_avg = EEGs[:,:,1].mean(axis=0)
trial1_avg.shape
(500,)
# data = channels 2-4, trial 0,
# first 250 time pts
data = EEGs[2:5,:250,0]
data.shape
(3, 250)
4-D: EEG time series for multiple subjects, channels, trials¶
n_subjects = 15
n_channels = 10
n_time_pts = 500
n_trials = 3
# fake EEGs
EEGs = np.random.random(
[n_subjects, n_channels, n_time_pts, n_trials]
)
EEGs.shape
(15, 10, 500, 3)
# data = everything for subject 7
data = EEGs[7,:,:,:]
data.shape
(10, 500, 3)
# data = everything for subject 7
data = EEGs[7]
data.shape
(10, 500, 3)
# data = all subjects, channel 3, trial 2
data = EEGs[:,3,:,2]
data.shape
(15, 500)
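The same `axis` argument used in the 3-D example extends to 4-D data, and it also accepts a tuple to average over several axes at once. This is a sketch on fake data with the same shape as above; the seeded generator and the `baseline`/`centered` names are assumptions for illustration.

```python
import numpy as np

n_subjects, n_channels, n_time_pts, n_trials = 15, 10, 500, 3

# Fake EEGs from a seeded generator so the example is repeatable.
rng = np.random.default_rng(0)
EEGs = rng.random((n_subjects, n_channels, n_time_pts, n_trials))

# Average over trials (last axis).
trial_avg = EEGs.mean(axis=3)
print(trial_avg.shape)  # (15, 10, 500)

# Average over subjects and trials at once.
grand_avg = EEGs.mean(axis=(0, 3))
print(grand_avg.shape)  # (10, 500)

# keepdims=True keeps the averaged axis with size 1, so the result
# broadcasts against the original array.
baseline = EEGs.mean(axis=2, keepdims=True)
centered = EEGs - baseline
print(baseline.shape, centered.shape)  # (15, 10, 1, 3) (15, 10, 500, 3)
```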
Array data type¶
A NumPy array holds data of a single type (e.g., float, int, or bool).
Unlike a Python list, it cannot mix different types of data.
data = np.random.random([2,3])
data
array([[0.15187631, 0.13323444, 0.27032076],
[0.79030714, 0.96409636, 0.47234137]])
data.dtype
dtype('float64')
data = np.arange(5)
data
array([0, 1, 2, 3, 4])
data.dtype
dtype('int64')
data = np.arange(5).astype(float)
data
array([0., 1., 2., 3., 4.])
data.dtype
dtype('float64')
floats = np.random.random([2,3]) * 10
ints = floats.astype(int)
floats, ints
(array([[0.33768618, 0.63979362, 1.78346574],
[4.93966793, 4.45008176, 0.81904804]]),
array([[0, 0, 1],
[4, 4, 0]]))
np.zeros(3)
array([0., 0., 0.])
np.zeros(3, dtype=float)
array([0., 0., 0.])
np.zeros(3, dtype=int)
array([0, 0, 0])
np.zeros(3, dtype=bool)
array([False, False, False])
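One consequence of the single-type rule, sketched below: values of mixed numeric types are converted to one common type when the array is created, and a float assigned into an integer array loses its fractional part.

```python
import numpy as np

# Mixed int, float and bool inputs are all stored as float64.
mixed = np.array([1, 2.5, True])
print(mixed, mixed.dtype)

# Assigning a float into an int array truncates toward zero.
ints = np.zeros(3, dtype=int)
ints[0] = 2.9
print(ints)  # [2 0 0]

# Division produces floats even when both operands are ints.
halves = np.array([1, 2, 3]) / 2
print(halves.dtype)  # float64
```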
NumPy is much faster than basic Python¶
%%timeit
# time this entire cell
tot = 0
for i in range(50000):
    tot += i
917 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# time a single line
%timeit np.arange(50000).sum()
20.1 µs ± 95.2 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
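The `%%timeit` and `%timeit` magics only work inside Jupyter/IPython. Outside a notebook, a similar comparison can be sketched with the standard-library `timeit` module; the function names and the repeat count of 100 are arbitrary choices, and the actual timings will vary by machine.

```python
import timeit

import numpy as np


def python_sum(n=50000):
    """Sum 0..n-1 with a plain Python loop."""
    tot = 0
    for i in range(n):
        tot += i
    return tot


def numpy_sum(n=50000):
    """Sum 0..n-1 with NumPy."""
    return np.arange(n).sum()


# Both approaches give the same answer.
print(python_sum() == numpy_sum())  # True

# Average time per call over 100 calls each.
n_calls = 100
loop_time = timeit.timeit(python_sum, number=n_calls) / n_calls
numpy_time = timeit.timeit(numpy_sum, number=n_calls) / n_calls
print(f"Python loop: {loop_time * 1e6:.1f} µs per call")
print(f"NumPy:       {numpy_time * 1e6:.1f} µs per call")
```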